Abstract. There exist very lucid explanations of the combinatorial origins of rational and 
algebraic functions, in particular with respect to regular and context free languages. In the 
search to understand how to extend these natural correspondences, we find that the shuffle 
product models many key aspects of D-finite generating functions, a class which contains 
the algebraic functions. We consider several different takes on the shuffle product, shuffle closure, and 
shuffle grammars, and give explicit generating function consequences. In the process, we 
define a grammar class that models D-finite generating functions. 



Introduction 

Generating functions of languages 

The (ordinary) generating function of a language $L$ is the sum 

$$L(z) = \sum_{w \in L} z^{|w|},$$

where $|w|$ is the length of the word $w$. This sum is a formal power series if there are finitely 
many words of a given length. In this case, we say the language is proper, and we can 
rewrite $L(z)$ as $L(z) = \sum_n \ell(n) z^n$, where $\ell(n)$ is the number of words in $L$ of length $n$. 
In the case where we have an unambiguous grammar to describe a regular language or 
a context free language, one can automatically generate equations satisfied by the generating 
function directly from the grammar. These are the well known translations: 

$$L = L_1 + L_2 \implies L(z) = L_1(z) + L_2(z)$$
$$L = L_1 \cdot L_2 \implies L(z) = L_1(z) L_2(z)$$
$$L = L_1^* \implies L(z) = (1 - L_1(z))^{-1}.$$
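
As a small illustration of these translations, the following Python sketch computes the counting sequence of the regular language $(a + bb)^*$ from the three rules and compares it with a brute-force enumeration. The choice of language, the helper names (`series_add`, `series_mul`, `series_star`) and the truncation order `N` are assumptions made for this example only.

```python
from itertools import product

N = 10  # number of coefficients kept (illustrative choice)

def series_add(a, b):
    """Disjoint union of languages: coefficientwise sum."""
    return [x + y for x, y in zip(a, b)]

def series_mul(a, b):
    """Unambiguous concatenation: Cauchy product, truncated at order N."""
    c = [0] * N
    for i in range(N):
        for j in range(N - i):
            c[i + j] += a[i] * b[j]
    return c

def series_star(a):
    """Unambiguous Kleene star: coefficients of 1 / (1 - a(z)), assuming a[0] == 0."""
    s = [1] + [0] * (N - 1)
    for m in range(1, N):
        s[m] = sum(a[k] * s[m - k] for k in range(1, m + 1))
    return s

a_z = [0, 1] + [0] * (N - 2)     # generating function of {a}: z
b_z = [0, 1] + [0] * (N - 2)     # generating function of {b}: z
bb_z = series_mul(b_z, b_z)      # {bb} = {b}.{b}: z^2
star_counts = series_star(series_add(a_z, bb_z))   # (a + bb)^*

def in_language(word):
    """A word over {a, b} lies in (a + bb)^* iff every maximal run of b's has even length."""
    return all(len(run) % 2 == 0 for run in "".join(word).split("a"))

brute_counts = [sum(in_language(w) for w in product("ab", repeat=n)) for n in range(N)]

print(star_counts)
print(brute_counts)
```

Both lists should agree, since the expression $(a + bb)^*$ is unambiguous; the resulting series is the rational function $1/(1 - z - z^2)$.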

Generating functions of formal languages are now a very established tool for algorithm 
analysis (see [12] for many references) and increasingly for random generation. In this 
context, we are also interested in the exponential generating function of a language. The 
two are related by the Laplace-Borel transform; however, it is sufficient for our purposes to 
think of the exponential generating function $\hat{L}(z)$ as the Hadamard product of $L(z)$ and 
$\exp(z) = \sum_n \frac{z^n}{n!}$; that is, $\hat{L}(z) = \sum_n \ell(n) \frac{z^n}{n!}$. 

One spectacular feature of generating functions of languages is the extent to which their 
analytic complexity models the complexity of the language. Specifically, we have the two 
classic results: first, regular languages have rational generating functions, and second, those 
context-free languages which are not inherently ambiguous have an algebraic generating 
function. The context-free languages form a large and historically important subclass of all 
objects which have algebraic generating functions. Bousquet-Melou provides us [6, 7] with 
an interesting discussion of the nature of combinatorial structures that possess algebraic 
and rational generating functions, including broad classes that are not representable as 
context-free languages. 

There remain unanswered questions related to other classes of languages, and other 
classes of functions. An example of the former is the question of Flajolet [10]: "In which 
class of transcendental functions do generating functions of (general) context free languages 
lie?" An example of the latter is the identification of languages whose generating functions 
are D-finite. This is an exceptional class of functions [24], which, for the moment, lacks a 
satisfying combinatorial explanation. We survey some current understandings in Section 1.3 
and provide a language theoretic interpretation of one in Section 3.1. 

To capture the analytic complexity of D-finite generating functions we should not expect 
a simple climbing of the language hierarchy (to indexed or context sensitive, say), as there 
are different notions of complexity in competition. For example, the language $\{a^n b^n c^n : n \in \mathbb{N}\}$ 
is difficult to recognize, but trivial to enumerate. Likewise, the generating function of 
the relatively simple looking language $\{z^{2^n} : n \in \mathbb{N}\}$ has a natural boundary at $|z| = 1$, 
which is a trademark of very complex analytic behaviour. 

The shuffle product 

In the absence of the obvious answers, we consider a very common, and useful operator, 
the shuffle product, and discover that it fills in many interesting holes in this story. Consider 
the words $u w_1$ and $v w_2$, and the letters $u, v \in \Sigma$. We define the shuffle product of two 
words recursively by the equation 

$$u w_1 \shuffle v w_2 = u(w_1 \shuffle v w_2) + v(u w_1 \shuffle w_2), \qquad w \shuffle \epsilon = w, \qquad \epsilon \shuffle w = w.$$

Here the union is disjoint, and we distinguish duplicated letters from the second word by 
a bar: $a \shuffle a = \{a\bar{a}, \bar{a}a\}$. Using the shuffle product we can define a class of languages with 
associated generating functions that form a class that strictly contains algebraic functions; 
it allows us to model a very straightforward combinatorial interpretation of the derivative 
(indeed in some interesting non-commutative algebras the shuffle product is even called a 
derivative); and it allows us to neatly consider some larger classes which are simultaneously 
more complex from the language and generating function points of view. 
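
The recursive definition of the shuffle of two words translates directly into code. The sketch below is only an illustration: words are Python strings, and the barred copy of a letter is represented by its upper-case form, which is an assumption of this example rather than notation from the text.

```python
def shuffle(u, v):
    """All words in the shuffle of u and v, following the recursive definition."""
    if not u or not v:
        # base cases: shuffling with the empty word returns the other word
        return [u + v]
    return ([u[0] + w for w in shuffle(u[1:], v)]
            + [v[0] + w for w in shuffle(u, v[1:])])

def bar(word):
    """Stand-in for the barred copy of a word (upper case is an assumption of this sketch)."""
    return word.upper()

print(shuffle("a", bar("a")))
print(len(shuffle("ab", bar("ab"))))
```

Because the two alphabets are kept disjoint, the shuffle of words of lengths $p$ and $q$ always contains $\binom{p+q}{p}$ distinct words.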



D-finite, also known as holonomic, functions satisfy linear differential equations with polynomial 
coefficients. 

Goal and Results 

The aim of this study is two-fold. We hope that a greater understanding of generat- 
ing function implications of adding the shuffle product to context free languages provides 
insight to a larger class of combinatorial problems. The second goal is to understand the 
combinatorial interpretations of different function classes that arise between algebraic and 
D-finite. The shuffle is a natural combinatorial product to consider since it is, in some sense, 
a generalization of pointing. 

In the present work, we first examine the shuffle as an operator on languages, and in 
the second part we consider the shuffle as a grammar production rule to define languages. 
We show that the shuffle closure of the context free languages is D-finite; we give the 
asymptotic growth of coefficients of two classes using shuffle; we define a special pointing 
class that describes all D-finite functions; and discuss the shuffle closure of a language. 

In the next section we review interpretations of differential equations. This is followed 
by a discussion on the shuffle of languages, and some descriptions of shuffle grammars. 

1. Interpreting differential equations combinatorially 

1.1. The class of D-finite functions 

The class of D-finite functions is of interest to the combinatorialist for many reasons. 
The coefficient sequence of a D-finite power series is P-recursive: it satisfies a linear recurrence 
of fixed length with polynomial coefficients, and hence is easy to generate, to manipulate, 
and even to "guess". By definition, D-finite functions satisfy linear differential equations 
with polynomial coefficients, and thus it is relatively straightforward in many cases 
to perform an asymptotic analysis on the coefficients, even without a closed form for the 
generating function. One important feature that we use here is that a P-recursive sequence 
grows asymptotically like 

$$\ell(n) \sim \lambda\, (n!)^{r/s} \exp\bigl(Q(n^{1/m})\bigr)\, \omega^n n^{\alpha} (\log n)^k,$$

where $r, s, m, n, k \in \mathbb{N}$, $Q$ is a polynomial and $\lambda, \omega, \alpha$ are complex numbers. We contrast 
this to the asymptotic template satisfied by coefficients of algebraic functions: 



where $\kappa$ is an algebraic number and $d \in \mathbb{Q} \setminus \{-1, -2, \ldots\}$. (A very complete source on 
the theory of asymptotic expansions of coefficients of algebraic functions arising in the 
combinatorial context is [12, Section VII.4.1].) Notable differences include the exponential/ 
logarithmic factors, the power of a factorial, and the allowable exponents of $n$. 

We shall use the following properties of the D-finite functions: the function $1/f$ is D-finite 
if, and only if, $f$ is of the form $\exp(g)h$, where $g$ and $h$ are algebraic [23]; the Hadamard 
product $f \times g = \sum_n f_n g_n z^n$ of two D-finite functions $f = \sum_n f_n z^n$ and $g = \sum_n g_n z^n$ is also 
D-finite. 
1.2. The simplest shuffle: the point 

Pointing (or marking) is an operation that has been long studied in connection with 
structures generated by grammars. The point of a word $w$, denoted $P(w)$, is a set of words, 
each with a different position marked. For example, $P(abc) = \{a^{\bullet}bc, ab^{\bullet}c, abc^{\bullet}\}$. From the 
enumerative point of view we remark that the two languages $L$ and $L_1 = P(L) = \{P(w) : 
w \in L\}$ satisfy the enumerative relation 

$$\ell_1(n) = n\,\ell(n), \qquad (1.2)$$

and hence $L_1(z) = z \frac{d}{dz} L(z)$. The pointing operator is relevant to our discussion because of 
the simple bijective correspondence between $P(L)$ and $L \shuffle a = \{w \shuffle a : w \in L\}$. 
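
The enumerative relation (1.2) is easy to check experimentally. In the sketch below the marked position is rendered as an upper-case letter, and the test language (words over $\{a, b\}$ of length at most 5 avoiding the factor $bb$) is an arbitrary choice made for illustration.

```python
from collections import Counter
from itertools import product

def point(word):
    """P(w): one copy of w per position; the marked letter is shown in upper case."""
    return [word[:i] + word[i].upper() + word[i + 1:] for i in range(len(word))]

# Illustrative language: words over {a, b} of length at most 5 with no factor "bb".
L = ["".join(w) for n in range(6) for w in product("ab", repeat=n)
     if "bb" not in "".join(w)]
L1 = [p for w in L for p in point(w)]

ell = Counter(len(w) for w in L)     # ell[n]  = number of words of length n in L
ell1 = Counter(len(w) for w in L1)   # ell1[n] = number of words of length n in P(L)

print(point("abc"))
print([(n, ell[n], ell1[n]) for n in range(6)])
```

Each row of the printed table should satisfy `ell1[n] == n * ell[n]`, which is the coefficient form of $L_1(z) = z \frac{d}{dz} L(z)$.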

The first obvious question is, "does pointing increase expressive power?". In the case 
of regular languages and context free languages the answer is no: we can add a companion 
non-terminal for each non-terminal that generates a language isomorphic to the pointed 
language. Let $A^{\bullet}$ be the pointed version of $A$. We add the following rules which model 
pointing: 

$$(AB)^{\bullet} = A^{\bullet}B + AB^{\bullet}, \qquad (A + B)^{\bullet} = A^{\bullet} + B^{\bullet}.$$
Remark how these rules resemble the corresponding product and sum rules for differen- 
tiation. Furthermore, from the point of view of generating functions, we know that the 
derivative of a rational function is rational again, and the derivative of an algebraic func- 
tion is again algebraic, and so we know immediately that we could not hope to increase the 
class of generating functions represented by this method. 

Pointing, when paired with a "de-pointing" operator which removes such marks, becomes 
powerful enough to describe other kinds of constructions, namely labelled cycles 
and sets [13, 15]. In this case we can describe set partitions, which have exponential 
generating function $\exp(\exp(z) - 1)$, which is not D-finite. 

It takes much more effort [5] to define a pointing operator with a differentiation property 
as in Eq. (1.2) for unlabelled structures defined using Set and Cycle constructions. It is a 
fruitful exercise, as one can then generate approximate size samplers with expected linear 
time complexity. 

1.3. Other combinatorial derivatives 

Combinatorial species theory [2] provides a rich formalism for explaining the interplay 
between analytic and combinatorial representations of objects. In particular, using the vehicle 
of the cycle index series, there are several possibilities for relating species to 
(multivariate) D-finite functions [18, 21]. In this realm, given an arbitrary linear differential 
equation with polynomial coefficients, we can define a set of grammar operators that allow 
us to construct a pair of species whose difference has a generating function that satisfies the 
given differential equation. Unfortunately, at present we lack the intuition to understand 
what this class "is"; specifically, we lack the tools to construct a test to see if any given 
class or language falls within it. 

In Section 3.1 we give a language theoretic interpretation of the derivative of a species; 
specifically, a grammar system from which, for any linear differential equation with 
coefficients from $\mathbb{Q}[x]$, we can generate a language whose generating function satisfies this 
equation. 
1.4. Other differential classes 

There are several other natural function classes related to differential equations. A 
series $f(z) \in \mathbb{C}[[z]]$ is said to be constructible differentiably algebraic (CDF) if it belongs to 
some finitely generated ring which is closed under differentiation [3, 4]. This is equivalent 
to satisfying a system of differential equations of a given form. Combinatorially, any CDF 
function can be interpreted as a family of enriched trees. Theorem 3 of [3] gives the result 
that if $\sum_n a_n \frac{t^n}{n!}$ is CDF, then $|a_n| = O(\alpha^n n!)$ for some complex constant $\alpha$. This class is 
not closed under Hadamard product, and an arbitrary CDF function is unlikely to have 
its image under the Borel transform also CDF. This is the key closure property required 
for a meaningful correspondence with respect to the shuffle product. 

A larger class which contains both CDF and D-finite is differentiably algebraic. A 
function is differentiably algebraic (DA) if it satisfies an algebraic differential equation of 
the form $P(x, y, y', \ldots, y^{(n)}) = 0$, where $P$ is a non-trivial polynomial in its $n + 2$ variables. 
(See Rubel's survey [22] for many references.) 

The set of DA functions is closed under multiplicative inverse and Hadamard product. 
These two facts together are sufficient to prove that all of the classes we consider are 
differentiably algebraic. 

1.5. Generating functions and shuffles 

Generating functions are a useful tool for the automatic study of certain combinatorial 
problems. The shuffle operator has a straightforward implication on the generating function, 
as we shall see. 

With the aid of the shuffle product, Flajolet et al. [11] are able to perform a straightfor- 
ward analysis of four problems in random allocation. By using some systematic translations, 
they are able to derive integral representations for expectations and probability distribu- 
tions. As they remark, the shuffle of languages appears in several places relating to analysis 
of algorithms (such as evolution of two stacks in a common memory area). 

2. The shuffle of two languages 

The shuffle of two languages is defined as 

$$L_1 \shuffle L_2 = \bigcup_{w_1 \in L_1,\ w_2 \in L_2} w_1 \shuffle w_2.$$

In order to use a generating function approach, we assume that $L_1$ is a language over the 
alphabet $\Sigma_1$, and $L_2$ is a language over $\Sigma_2$, and $\Sigma_1 \cap \Sigma_2 = \emptyset$. If they share an alphabet, it 
suffices to add a bar on top of the copy from $\Sigma_2$. 

2.1. The shuffle closure of context free languages 

We consider the shuffle closure of a language in the next section, and first concentrate 
on the shuffle closure of a class of languages. For any given class of languages $\mathcal{C}$, the 
shuffle closure can be defined as the (infinite) union of $S_0, S_1, \ldots$, the sequence 
recursively defined by 

$$S_0 = \mathcal{C}, \qquad S_n = \{L_1 \shuffle L_2 : L_1 \in S_{n-1},\ L_2 \in \mathcal{C}\}.$$
The shuffle product is commutative and associative [20], and thus the closure contains 
$S_j \shuffle S_i$, for any $i$ and $j$. Remark that for any given language in the closure, there is a 
bound on the number of shuffle productions that can occur in any derivation tree; namely, 
if $L \in S_n$, that bound is $n$. 

In general, we denote the closure of a class of languages under shuffle as . The class 
of regular languages is closed under the shuffle product, since the shuffle of any two regular 
languages is regular. However, the context free languages are not closed under the shuffle 
product [20], and hence we consider its closure. 

The prototypical language in this class is the shuffle of (any finite number of) Dyck 
languages. Let $|w|_a$ denote the number of occurrences of the letter $a$ in the word $w$. Let 
$\mathcal{D}$ be the Dyck language over the alphabet $\Sigma = \{u, d\}$: 

$$\mathcal{D} = \{w \in \Sigma^* : w'v = w \implies |w'|_u \ge |w'|_d, \text{ and } |w|_u = |w|_d\}.$$

We construct an isomorphic version $\mathcal{E}$ over the alphabet $\{l, r\}$. 

The language $\mathcal{D} \shuffle \mathcal{E}$ encodes random walks restricted to the quarter plane with 
steps from u(p), d(own), r(ight), and l(eft) that return to the origin. By considering the 
larger language of Dyck prefixes, we can model walks that end anywhere in the quarter 
plane. Indeed, as the shuffle does preserve two distinct sets of prefix conditions, there are 
many examples of random walks in bounded regions that can be expressed as shuffles of 
algebraic languages. 

It might be interesting to consider other standard questions of classes of languages for 
this closure class; in particular if interesting random walks arise. 



2.2. The closure is D-finite 

In order to show that the shuffle product of two languages with D-finite generating func- 
tions also has a D-finite generating function, we consider the following classic observation 
on the enumeration of shuffles of languages. 

If $L$ is the shuffle of $L_1$ and $L_2$, then the words of length $n$ in $L$ are easily 
counted if the generating series $L_1(z) = \sum_n \ell_1(n) z^n$ and $L_2(z) = \sum_n \ell_2(n) z^n$ are known, by the 
following formula: 

$$\ell(n) = \sum_{n_1 + n_2 = n} \binom{n}{n_1} \ell_1(n_1)\, \ell_2(n_2).$$

To see this, recognize that a word in $L$ is composed of two words, and a set of positions 
for the letters in the word from $L_1$. This is equivalent to 

$$\frac{\ell(n)}{n!} = \sum_{n_1 + n_2 = n} \frac{\ell_1(n_1)}{n_1!} \frac{\ell_2(n_2)}{n_2!}, \qquad (2.1)$$

which amounts to the relation between the exponential generating functions of the three 
languages: 

$$L = L_1 \shuffle L_2 \implies \hat{L}(z) = \hat{L}_1(z) \hat{L}_2(z). \qquad (2.2)$$
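
The binomial convolution can be checked by brute force on small finite languages. In the following sketch the two languages `L1` and `L2` are arbitrary examples over the disjoint alphabets $\{a, b\}$ and $\{c, d\}$, chosen only for illustration.

```python
from collections import Counter
from math import comb

def shuffle(u, v):
    """All words in the shuffle of the strings u and v."""
    if not u or not v:
        return [u + v]
    return ([u[0] + w for w in shuffle(u[1:], v)]
            + [v[0] + w for w in shuffle(u, v[1:])])

L1 = ["", "a", "ab", "ba", "aab"]     # over {a, b} (illustrative)
L2 = ["c", "cd", "dc", "ccd"]         # over {c, d} (illustrative)

# Brute force: build the shuffle of the two languages and count words by length.
L = {w for u in L1 for v in L2 for w in shuffle(u, v)}
brute = Counter(len(w) for w in L)

# Formula: binomial convolution of the two counting sequences.
l1 = Counter(len(w) for w in L1)
l2 = Counter(len(w) for w in L2)

def formula(n):
    return sum(comb(n, n1) * l1[n1] * l2[n - n1] for n1 in range(n + 1))

print([(n, brute[n], formula(n)) for n in range(7)])
```

The two counts agree because, with disjoint alphabets, a word of the shuffle determines both of its constituent words and the set of positions occupied by the letters from the first one.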
Using these relations, we can easily prove the following result. 

Proposition 2.1. If $L_1$ and $L_2$ are languages with D-finite ordinary generating functions, 
then the generating series $L(z)$ for $L = L_1 \shuffle L_2$ is also D-finite. 

As is the case with many of the most interesting closure properties of D-finite functions, 
the proof follows from the closure of D-finite functions under Hadamard product [19]. 

Proof. Since D-finite functions are closed under Hadamard product, the ordinary generating 
function of a sequence is D-finite if and only if its exponential generating function is D-finite. 
Consequently, if $L_1(z)$ and $L_2(z)$ are D-finite, then so are the exponential generating 
functions $\hat{L}_1(z)$ and $\hat{L}_2(z)$. By closure under product, $\hat{L}(z) = \hat{L}_1(z)\hat{L}_2(z)$ is D-finite, and thus so is $L(z)$. ■ 

This result has the following consequences. 

Corollary 2.2. If $L_1$ and $L_2$ are context free languages which are not inherently ambiguous, 
then the generating series $L(z)$ for $L = L_1 \shuffle L_2$ is D-finite. 

Corollary 2.3. Any language in the shuffle closure of context free languages has a D-finite 
generating function. 

2.3. Asymptotic template for $\ell(n)$ 

We continue the example from the previous section using the two Dyck languages $\mathcal{D}$ 
and $\mathcal{E}$. It is straightforward to compute that $D(z) = E(z) = \sum_n \binom{2n}{n} \frac{1}{n+1} z^{2n}$, and the 

number of words of length $n$ in the shuffle is given by 




We remark that an asymptotic expression for $\ell(n)$ can be determined by first using the 
Vandermonde-Chu identity to simplify $\ell(n)$: 

$$\ell(n) = \binom{n}{\lfloor n/2 \rfloor} \binom{n+1}{\lceil n/2 \rceil},$$

and then by applying Stirling's formula. Since $\ell(n) \sim 4^n/n$, and the exponent $-1$ of $n$ is a 
negative integer, which the template for algebraic functions excludes, we see that the resulting series 
is not algebraic. Flajolet uses this technique extensively in [10] to prove that certain context-free 
languages are inherently ambiguous. Thus, the class of generating functions of our 
languages strictly contains the algebraic functions. 
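
The counts in this example can be explored numerically. The sketch below compares the binomial convolution of Section 2.2 with a direct count of quarter-plane walks, in two variants: for Dyck words, where the walks return to the origin and the count at length $2m$ is the classical product $C_m C_{m+1}$ of Catalan numbers, and for Dyck prefixes, where the walks end anywhere and the count is $\binom{n}{\lfloor n/2 \rfloor}\binom{n+1}{\lceil n/2 \rceil}$. The function names and the bound `N` are illustrative assumptions.

```python
from math import comb

def catalan(m):
    return comb(2 * m, m) // (m + 1)

def shuffle_counts(l1, l2):
    """Counting sequence of a shuffle: binomial convolution of two counting sequences."""
    n_max = min(len(l1), len(l2))
    return [sum(comb(n, k) * l1[k] * l2[n - k] for k in range(n + 1))
            for n in range(n_max)]

def quarter_plane_walks(n_max):
    """Counts of walks of length 0..n_max-1 with unit steps staying in the quarter plane:
    (walks ending anywhere, walks returning to the origin)."""
    anywhere, excursions = [], []
    layer = {(0, 0): 1}
    for _ in range(n_max):
        anywhere.append(sum(layer.values()))
        excursions.append(layer.get((0, 0), 0))
        nxt = {}
        for (x, y), c in layer.items():
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                if x + dx >= 0 and y + dy >= 0:
                    key = (x + dx, y + dy)
                    nxt[key] = nxt.get(key, 0) + c
        layer = nxt
    return anywhere, excursions

N = 14
dyck = [catalan(n // 2) if n % 2 == 0 else 0 for n in range(N)]   # Dyck words by length
prefixes = [comb(n, n // 2) for n in range(N)]                    # Dyck prefixes by length

anywhere, excursions = quarter_plane_walks(N)
words_shuffle = shuffle_counts(dyck, dyck)
prefix_shuffle = shuffle_counts(prefixes, prefixes)

print(words_shuffle[:9], excursions[:9])
print(prefix_shuffle[:9], anywhere[:9])
```

In both variants the exponential growth rate is $4$ and the power of $n$ is a negative integer, which is what rules out algebraicity.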

Thus, we have some elements of a class of functions with a nice asymptotic expansion. 
A rough calculation gives that the shuffle of two languages, with respective asymptotic 
growth of $\kappa_i n^{r_i} \alpha_i^n$ for $i = 1, 2$, has growth given by the expression 

$$\ell(n) \sim \kappa\, n^{r_1 + r_2} (\alpha_1 + \alpha_2)^{n - r_1 - r_2}.$$

How could one hope to prove directly that all elements in this class have an expansion of 
the form 

$$\ell(n) \sim \kappa\, \alpha^n n^r,$$

where now $r$ can be any rational, and $\kappa$ is no longer restricted to algebraic numbers? It 
seems that it should be possible to prove this at least for the shuffles of series which satisfy 
the hypotheses of Theorem 3.11 [7], using a more generalized form of the Chu-Vandermonde 
identity, or for the closure of the sub-class of context-free languages possessing an $\mathbb{N}$-algebraic 
generating function. In this case $d = -3/2$, and this simplifies the analyses considerably. 
Unfortunately, it does not seem that a direct application of Bender's method [12, Theorem 
VI.2] applies. 

Theorem 3.2 states that the asymptotic form will not contain any powers of $n!$ greater 
than 2. This illustrates a limitation of the expressive power of the shuffle closure of 
context free languages: there are known natural combinatorial objects which have D-finite 
generating functions with coefficients that grow asymptotically with higher powers of $n!$. 
For example, the number of $k$-regular graphs for $k > 4$ contains $(n!)^{k/2}$, and the conjectured 
asymptotic for $k$-uniform Young tableaux contains 

3. Shuffle grammars 

We extend the first approach by allowing the shuffle to come into play earlier in the 
story; we add the shuffle operator to our grammar rewriting rules. Shuffle grammars as 
defined by Gischer [14] include a shuffle rule, and a shuffle closure rule. We consider these 
in Section 3.4. 

As we did earlier, we first consider languages which have a natural bound on the number 
of shuffle productions that can occur in a derivation tree of any word in the language. That 
is followed by an example of a recursive shuffle grammar to illustrate how powerful they can 
be. It has been proven [17] that the recursive shuffle grammars do indeed have a greater 
expressive power, but it is not always clear how to interpret the resulting combinatorial 
families. We begin with a second kind of pointing operator. 

3.1. A terminal pointing operator 

The traditional pointing operator can be used to model $z\frac{d}{dz}$, but one can show that 
this is, in fact, insufficient to generate all D-finite functions. To remedy this, we define 
a pointing operator which mimics the concept behind the derivative of a species. This 
pointing operator has the effect of converting a letter to an epsilon by 'marking' the letter. 
Consequently, a letter cannot be marked more than once, and each subsequent time a 
word is marked, a counter on the mark is incremented. The pointing operator 
applied to a set of words is the pointing operator applied to each of the elements of the 
set. Notationally, we distinguish the marks with accumulated primes. We give some examples: 

$$\mathcal{P}(aab) = a'ab + aa'b + aab'$$

$$\mathcal{P}(\mathcal{P}(aab)) = a'a''b + a'ab'' + a''a'b + aa'b'' + a''ab' + aa''b'$$

$$\mathcal{P}(a''a'b''') = \emptyset.$$

The length of a word is the number of unmarked letters in it (but the combinatorial 
objects in the language encode more than just the length in some sense). The number of 
words in the pointing of a word is equal to its length. 

This gives a straightforward interpretation of the derivative: 

$$L_1 = \mathcal{P}(L) \implies L_1(z) = \frac{d}{dz} L(z).$$
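
The marking operator can be prototyped in a few lines. In this sketch a word is a tuple of pairs `(letter, mark)`, where `mark` is the number of primes (0 for an unmarked letter); this representation, and the convention that a new mark carries one more prime than any mark already present in the word, are assumptions chosen to reproduce the examples above.

```python
def render(word):
    """Display a marked word, e.g. (('a', 1), ('a', 0), ('b', 2)) -> a'ab''."""
    return "".join(letter + "'" * mark for letter, mark in word)

def length(word):
    """The length of a marked word is its number of unmarked letters."""
    return sum(1 for _, mark in word if mark == 0)

def point(word):
    """Mark each unmarked letter in turn, with one more prime than any existing mark."""
    new_mark = 1 + max((mark for _, mark in word), default=0)
    return [word[:i] + ((letter, new_mark),) + word[i + 1:]
            for i, (letter, mark) in enumerate(word) if mark == 0]

def point_all(words):
    """Pointing of a set of words: pointing applied to each element."""
    return [p for w in words for p in point(w)]

w = tuple((c, 0) for c in "aab")
once = point(w)
twice = point_all(once)
thrice = point_all(twice)

print([render(x) for x in once])
print([render(x) for x in twice])
print(point_all(thrice))   # fully marked words have an empty pointing
```

Since a word of length $n$ has exactly $n$ pointings, each of length $n - 1$, the term $z^n$ contributes $n z^{n-1}$, which is the derivative.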

Using this definition if ^ is a symbol which 'yields' through a grammar a language 

Remark that if we allow concatenation after marking, we could generate two letters in the 
same word marked with a single prime via concatenation of marked words. 

Using the marking operation, we can express most D-finite functions, specifically, by the 
differential equations that they satisfy. For example, the series $P(z) = \sum_{n \ge 0} n!\, z^n$ satisfies 
the differential equation 

$$P(z) = 1 + zP(z) + z^2 P'(z).$$
This is modelled by the grammar 

$$A \to \epsilon, \qquad A \to aA, \qquad A \to bc\,\mathcal{P}(A),$$

where the rule $A \to \epsilon$ accounts for the constant term 1. 

An alphabet of three letters ($a$, $b$, $c$) allows us to track the origin of each letter. Here is the 
result of the third iteration of the rules: 

$$1 \oplus a \oplus aa + bca' \oplus aaa + abca' + bca'a + bcb''ca' + bcaa' + bcbc''a' \oplus aaaa + aabca' + abca'a + \cdots$$
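
The iteration can be simulated directly. The sketch below assumes the three rules $A \to \epsilon$, $A \to aA$ and $A \to bc\,\mathcal{P}(A)$, represents a word as a tuple of `(letter, mark)` pairs, and counts the generated words by length (number of unmarked letters); the number of iterations is an arbitrary choice.

```python
from math import factorial

def point(word):
    """Mark each unmarked letter in turn, with one more prime than any existing mark."""
    new_mark = 1 + max((mark for _, mark in word), default=0)
    return [word[:i] + ((letter, new_mark),) + word[i + 1:]
            for i, (letter, mark) in enumerate(word) if mark == 0]

def length(word):
    """Number of unmarked letters."""
    return sum(1 for _, mark in word if mark == 0)

def iterate(words):
    """One application of the rules A -> epsilon | a A | b c P(A) to a set of words."""
    a, b, c = ("a", 0), ("b", 0), ("c", 0)
    new = {()}                                              # A -> epsilon
    new |= {(a,) + w for w in words}                        # A -> a A
    new |= {(b, c) + p for w in words for p in point(w)}    # A -> b c P(A)
    return new

ITERATIONS = 7
words = set()
for _ in range(ITERATIONS):
    words = iterate(words)

# Each rule application adds exactly one to the length, so after ITERATIONS rounds
# all words of length < ITERATIONS have been generated.
counts = [sum(1 for w in words if length(w) == n) for n in range(ITERATIONS)]
print(counts)
```

The counts should be $1, 1, 2, 6, 24, \ldots$, in agreement with the coefficients $n!$ of $P(z)$.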

We will call a pointing grammar one that has rules of the form 

$$A \to w, \qquad A \to wB, \qquad A \to \mathcal{P}(B). \qquad (3.1)$$

Despite the fact that we allow only left concatenation (a strategy to avoid concatenating 
pointed words), these grammar rules can model any D-finite function. 

We can define a procedure for finding a language given a defining equation satisfied by 
a D-finite generating function. Say that a generating function $T(z)$ satisfies 

$$T(z) = q(z) + q_0(z)T(z) + q_1(z)T'(z) + \cdots + q_n(z)T^{(n)}(z). \qquad (3.2)$$

Now substitute $T(z) = P(z) - N(z)$ to obtain 

$$P(z) - N(z) = q(z) + q_0(z)\bigl(P(z) - N(z)\bigr) + q_1(z)\bigl(P'(z) - N'(z)\bigr) + \cdots + q_n(z)\bigl(P^{(n)}(z) - N^{(n)}(z)\bigr).$$

Use also the notation $q_i(z) = q_i^+(z) - q_i^-(z)$, where $q_i^+(z)$ collects the positive terms of 
the polynomial and $q_i^-(z)$ the negative ones, and similarly $q(z) = q^+(z) - q^-(z)$. 
Then if 

$$P(z) = q^+(z) + q_0^+(z)P(z) + q_0^-(z)N(z) + \cdots + q_n^+(z)P^{(n)}(z) + q_n^-(z)N^{(n)}(z) \qquad (3.3)$$

and 

$$N(z) = q^-(z) + q_0^-(z)P(z) + q_0^+(z)N(z) + \cdots + q_n^-(z)P^{(n)}(z) + q_n^+(z)N^{(n)}(z) \qquad (3.4)$$

then $P(z) - N(z)$ satisfies equation (3.2). 

Now we can define a language with a rule for each monomial in (3.3) and (3.4), where 
every term $z^a R^{(k)}(z)$ is represented by a rule of the form 

$$\tilde{R} \to w\, \mathcal{P}(\mathcal{P}(\cdots \mathcal{P}(R) \cdots)),$$

where $\mathcal{P}$ occurs $k$ times, $R$ and $\tilde{R}$ are symbols representing a language whose generating 
function is either $P(z)$ or $N(z)$, and $w$ is a word of length $a$. 

Any language which is generated from rules of the form Eq. (3.1) has a generating 
function which satisfies a linear differential equation, and hence is D-finite. 

We summarize this in the following theorem. 

Theorem 3.1. A language which is generated from the rules of the form Eq. (3.1) has a 
D-finite generating function. Moreover, any D-finite function can be written as a difference 
of two generating functions for languages which are generated by rules of this form. 
3.2. Acyclic shuffle dependencies 

We consider languages generated by the following re-writing rules, where $w$ is a word, 
and $A$, $B$ and $C$ are non-terminals: 

$$A \to w, \qquad A \to BC, \qquad A \to B \shuffle C. \qquad (3.5)$$

For any language generated by rules of the above type, and a fixed set of non-terminals, we 
construct the graph with non-terminals as nodes, and for every production rule $A \to B \shuffle C$, 
we make an edge from $A$ to $B$ and an edge from $A$ to $C$. If this graph is acyclic, we say 
the language has acyclic shuffle dependencies. The next section treats languages that have 
a cyclic dependency. 

We prove that this class of languages is larger than those generated by the pointing 
operator of the previous section, because we can generate a language with a generating 
function that is not D-finite. 

We re-use the Dyck languages $\mathcal{D}$ and $\mathcal{E}$ defined in Section 2.1. Consider the language 
generated by the following grammar: 

$$A \to \mathcal{D} \shuffle \mathcal{E}, \qquad C \to 1 \mid AC.$$

The shuffle dependency graph is a tree, and thus this is in our class. The generating 
function $A(z)$ has an explicit expression in terms of the complete elliptic integrals 
$\mathrm{EllipticK}(4z)$ and $\mathrm{EllipticE}(4z)$, and 

$$C(z) = \frac{1}{1 - A(z)}.$$

Since $1 - A(z)$ is not of the form $\exp(g)h$ with $g$ and $h$ algebraic, $C(z)$ is not D-finite. Nonetheless, 
we can prove an asymptotic result about generating functions in this class. 

Theorem 3.2. Let $L$ be a proper language generated by shuffle productions in an unambiguous 
grammar with rules of the form given in Eq. (3.5), on an alphabet with $k$ letters. The 
number of words of length $n$, $\ell(n)$, satisfies $\ell(n) = O((n!)^2)$. 

Proof. Since the grammar generates proper languages, there are no shuffle productions 
with epsilon. Thus, the derivation tree of a word of length $n$ can have at most $n$ shuffle 
productions. In the worst case, each one increments the alphabet and so the maximum size 
of alphabet that a word of length $n$ can draw on is then $kn$. The total number of words 
of length $n$ from this alphabet is $(kn)^n$. 

For $k < n$ the result follows by Stirling's formula. ■ 

3.3. Cyclic shuffle dependencies 

Languages in this class will have an infinite alphabet since we use a disjoint union in our 
shuffle. However, the number of words of a given length is finite if there is no derivation tree 
possible that is a shuffle with $\epsilon$. Under this restriction, any word of length $n$ comes from 
an alphabet using no more than a constant multiple of $n$ letters. We consider 
an important class of this type in the next section. 
3.4. The shuffle closure of a language 

A class of languages which falls under this category are those that are generated using 
the shuffle closure operator. The shuffle closure of a language is defined recursively in the 
following way: $L^{\shuffle 2} = L \shuffle L$, and $L^{\shuffle n} = L^{\shuffle (n-1)} \shuffle L$. The shuffle closure is the union over 
all finite shuffles: 

$$L^{\shuffle *} = \bigcup_n L^{\shuffle n}.$$

Equivalently, we write this as a grammar production: $A \to A \shuffle B \mid B$. The shuffle closure 
[16, 17] provides extremely concise notation. In particular, it arises in descriptions 
of sequential execution histories of concurrent processes. 

Remark that the closure of a language is one single language, whereas the closure of 
the class of languages consisting of that one language is an infinite set of languages. 

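The shuffle closure of a single letter gives all permutations: 

$$a \oplus a\bar{a} + \bar{a}a \oplus a\bar{a}\bar{\bar{a}} + a\bar{\bar{a}}\bar{a} + \bar{a}a\bar{\bar{a}} + \bar{a}\bar{\bar{a}}a + \bar{\bar{a}}a\bar{a} + \bar{\bar{a}}\bar{a}a \oplus \cdots$$

The generating function of this language is $\sum_n n!\, z^n$, and indeed the generating 
function of the shuffle closure of any word of length $k$ is $\sum_n (kn)! \left(\frac{z^k}{k!}\right)^n$, which is also D-finite. 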
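
The coefficient $(kn)!/(k!)^n$ can be confirmed by brute force for small cases. In this sketch the $j$-th copy of the word is made disjoint from the others by tagging each letter with the index $j$; this tagging plays the role of the bars and is an assumption of the example.

```python
from math import factorial

def shuffle(u, v):
    """All words in the shuffle of two tuples of (letter, tag) pairs."""
    if not u or not v:
        return [u + v]
    return ([u[:1] + w for w in shuffle(u[1:], v)]
            + [v[:1] + w for w in shuffle(u, v[1:])])

def shuffle_power(word, n):
    """Set of words in the n-fold shuffle of `word` with itself; copy j is tagged with j."""
    words = {tuple((c, 0) for c in word)}
    for j in range(1, n):
        copy = tuple((c, j) for c in word)
        words = {w for u in words for w in shuffle(u, copy)}
    return words

def expected(k, n):
    """Number of words of length k*n in the shuffle closure of a word of length k."""
    return factorial(k * n) // factorial(k) ** n

for word, n in [("a", 4), ("ab", 3), ("abc", 2)]:
    print(word, n, len(shuffle_power(word, n)), expected(len(word), n))
```

For a single letter this recovers the $n!$ permutations of $n$ distinguishable copies.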

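To prove our formula above, we express the generating function of the shuffle closure in terms of the 
operators which switch between the ordinary and exponential generating functions. Recall that 
$L(z) = \sum_n a_n z^n \implies \hat{L}(z) = \sum_n a_n \frac{z^n}{n!}$, and we define the Laplace operator $\mathcal{L}$ by $\mathcal{L} \cdot \hat{L}(z) = L(z)$. 
Then, writing $L^{\shuffle *}$ for the shuffle closure of $L$, 

$$L_1 = L^{\shuffle *} \implies L_1(z) = \sum_n \mathcal{L} \cdot \bigl(\hat{L}(z)\bigr)^n. \qquad (3.6)$$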

Although all of the summands are D-finite, it is possible that the sum is not. 

Clearly, the shuffle closure does not preserve regularity, and indeed adding it and 
the shuffle product to regular languages is enough to generate all recursively enumerable 
languages. Thus, we see that if there is no bound on the number of shuffles possible in any 
expression tree, the languages can get far more complex. 

Nonetheless the following conjecture seems reasonable, and perhaps it is possible to 
prove it starting from Eq. (3.6), with a necessarily more sophisticated analysis. 

Conjecture 3.3. The shuffle closure of a regular language has a D-finite generating func- 
tion. 

4. Conclusion 

A next step is to adapt the Boltzmann generators to these languages. Since we can effectively 
simulate labelled objects in an unlabelled context, we can easily describe objects like 
strong interval trees. This approach might allow a detailed analysis of certain parameters 
of permutation sorting by reversals, as applied to comparative genomics [1]. 

We are also interested in characterizing the context-free languages whose shuffle is not 
algebraic, and in considering the other natural questions of closure that are standard for 
language classes. 

Acknowledgments. We gratefully acknowledge many discussions from the Algebraic Combinatorics 
Seminar at the Fields Institute. In particular, we acknowledge contributions by N. Bergeron, C. 
Hohlweg, and M. Rosas. We wish to also acknowledge the financial support of NSERC. 






M. MISHNA AND M. ZABROCKI 



References 

[1] Séverine Bérard, Anne Bergeron, Cédric Chauve, and Christophe Paul. Perfect sorting by reversals is 
not always difficult. IEEE/ACM Trans. on Comput. Biology and Bioinformatics, 4(1), 2007. 

[2] F. Bergeron, G. Labelle, and P. Leroux. Combinatorial species and tree-like structures, volume 67 of 
Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 1998. 

[3] François Bergeron and Christophe Reutenauer. Combinatorial resolution of systems of differential 
equations. III. A special class of differentially algebraic series. European J. Combin., 11(6):501-512, 1990. 

[4] François Bergeron and Ulrike Sattler. Constructible differentially finite algebraic series in several 
variables. Theoret. Comput. Sci., 144(1-2):59-65, 1995. 

[5] Manuel Bodirsky, Éric Fusy, Mihyun Kang, and Stefan Vigerske. An unbiased pointing operator for 
unlabeled structures, with applications to counting and sampling. In Nikhil Bansal, Kirk Pruhs, and 
Clifford Stein, editors, SODA, pages 356-365. SIAM, 2007. 

[6] Mireille Bousquet-Mélou. Algebraic generating functions in enumerative combinatorics, and context-free 
languages. In STACS 05, volume 3404 of Lecture Notes in Comput. Sci., pages 18-35. Springer, 2005. 

[7] Mireille Bousquet-Mélou. Rational and algebraic series in combinatorial enumeration. In International 
Congress of Mathematicians, pages 789-826, 2006. 

[8] Frédéric Chyzak, Marni Mishna, and Bruno Salvy. Effective scalar products of D-finite symmetric 
functions. J. Combin. Theory Ser. A, 112(1):1-43, 2005. 

[9] Philippe Duchon, Philippe Flajolet, Guy Louchard, and Gilles Schaeffer. Boltzmann samplers for the 
random generation of combinatorial structures. Combin. Probab. Comput., 13(4-5):577-625, 2004. 

[10] Philippe Flajolet. Analytic models and ambiguity of context-free languages. Theoret. Comput. Sci., 
49(2-3):283-309, 1987. 

[11] Philippe Flajolet, Danièle Gardy, and Loÿs Thimonier. Birthday paradox, coupon collectors, caching 
algorithms and self-organizing search. Discrete Appl. Math., 39(3):207-229, 1992. 

[12] Philippe Flajolet and Robert Sedgewick. Analytic Combinatorics. 
http://algo.inria.fr/flajolet/Publications/books.html, 2006. 

[13] Philippe Flajolet, Paul Zimmermann, and Bernard Van Cutsem. A calculus for the random generation 
of labelled combinatorial structures. Theoret. Comput. Sci., 132(1-2):1-35, 1994. 

[14] Jay Gischer. Shuffle languages, Petri nets, and context-sensitive grammars. Communications of the 
ACM, 24(9), September 1981. 

[15] Daniel Hill Greene. Labelled Formal Languages and Their Uses. PhD thesis, Stanford University, 1983. 

[16] Matthias Jantzen. Extending regular expressions with iterated shuffle. Theoret. Comput. Sci., 
38(2-3):223-247, 1985. 

[17] Joanna Jędrzejowicz. Infinite hierarchy of expressions containing shuffle closure operator. Inform. 
Process. Lett., 28(1):33-37, 1988. 

[18] Gilbert Labelle and Cédric Lamathe. A theory of general combinatorial differential operators. In Formal 
Power Series and Algebraic Combinatorics, 2007. 

[19] L. Lipshitz. The diagonal of a D-finite power series is D-finite. J. Algebra, 113(2):373-378, 1988. 

[20] M. Lothaire. Combinatorics on words, volume 17 of Encyclopedia of Mathematics and its Applications. 
Addison-Wesley Publishing Co., Reading, Mass., 1983. 

[21] Marni Mishna. Automatic enumeration of regular objects. J. Integer Sequences, 10:Article 07.5.5, 2007. 

[22] Lee A. Rubel. A survey of transcendentally transcendental functions. Amer. Math. Monthly, 
96(9):777-788, 1989. 

[23] Michael F. Singer. Algebraic relations among solutions of linear differential equations. Trans. Amer. 
Math. Soc., 295(2):753-763, 1986. 

[24] Richard P. Stanley. Enumerative combinatorics. Vol. 2, volume 62 of Cambridge Studies in Advanced 
Mathematics. Cambridge University Press, Cambridge, 1999. 



This work is licensed under the Creative Commons Attribution-NoDerivs License. To view a 
copy of this license, visit http://creativecommons.org/licenses/by-nd/3.0/. 



